Skip to content

create: implement retries for individual fs files (backport of #7351)#9812

Draft
ThomasWaldmann wants to merge 4 commits into
borgbackup:1.4-maintfrom
ThomasWaldmann:backport-7351-retries-1.4
Draft

create: implement retries for individual fs files (backport of #7351)#9812
ThomasWaldmann wants to merge 4 commits into
borgbackup:1.4-maintfrom
ThomasWaldmann:backport-7351-retries-1.4

Conversation

@ThomasWaldmann

Copy link
Copy Markdown
Member

Backport of #7351 to 1.4-maint.

Retry reading a backup source file when it raises a BackupError / BackupOSError:

  • retry the same file after a sleep (1ms, growing exponentially up to 10s, 10 tries)
  • BackupOSError: if retries don't help, skip the file, log it with E status
  • BackupError (changed while reading): last try backs it up, logged with C status

Works for borg create's builtin fs recursion, --paths-from-command and --paths-from-stdin.

1.4-specific care: checkpoint part files

1.4 has checkpointing that may write .borg_part_N files. A naive retry would re-read the file from the start and write a second set of part files, so concatenating all part files would no longer reproduce the complete file. Therefore a file is not retried once part files have been written for it (ChunksProcessor.last_part_number tracks this) — it is kept as-is (C) or skipped (E) instead.

Permission errors

EPERM/EACCES are not retried (they won't get better) — avoids ~15s of pointless backoff on every unreadable file. This is the post-#7351 follow-up behavior; included here to avoid a regression vs. current 1.4.

Tests

Adds a ChunkerFailing test chunker (--chunker-params=fail,block_size,map) to drive the error handling, and:

  • test_create_erroneous_file (from create: implement retries for individual fs files #7351) — file is read on a later retry
  • test_create_no_permission_file — permission error is not retried, file skipped
  • test_create_erroneous_file_with_part_filesnew, 1.4-specific — a file with part files is not retried (no master equivalent, since master 2.0 has no part files)

🤖 Generated with Claude Code

…ckup#7351)

Backport of borgbackup#7351 to 1.4-maint. Retry reading a backup source file when
it raises a BackupError / BackupOSError (e.g. transient I/O error, or the
file changed while we read it):
- retry the same file after some sleep time (1ms, growing exponentially up to 10s, 10 tries)
- BackupOSError: if retries do not help, skip the file, log it with "E" status
- BackupError (changed while reading): last try backs it up, logged with "C" status

Works for borg create's builtin fs recursion, --paths-from-command and --paths-from-stdin.

1.4-specific: do NOT retry a file once checkpoint part files (.borg_part_N)
have been written for it. Re-reading it from the start would create duplicate /
inconsistent part files (concatenating all part files would then no longer yield
the complete file). ChunksProcessor.last_part_number tracks this; such a file is
kept as-is ("C") or skipped ("E") instead of being retried.

Permission errors (EPERM/EACCES) are not retried (they won't get better),
so we don't waste ~15s of backoff on every unreadable file.

Adds a ChunkerFailing test chunker (chunker-params=fail,block_size,map) to drive
the error handling, and tests:
- test_create_erroneous_file: file is read on a later retry
- test_create_no_permission_file: permission error is not retried, file skipped
- test_create_erroneous_file_with_part_files: file with part files is not retried

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@codecov

codecov Bot commented Jun 25, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 68.65672% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.30%. Comparing base (b09bbed) to head (7097ea9).
⚠️ Report is 4 commits behind head on 1.4-maint.

Files with missing lines Patch % Lines
src/borg/archiver.py 65.38% 13 Missing and 5 partials ⚠️
src/borg/archive.py 70.00% 3 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##           1.4-maint    #9812      +/-   ##
=============================================
+ Coverage      82.04%   82.30%   +0.26%     
=============================================
  Files             38       38              
  Lines          11384    11411      +27     
  Branches        1794     1799       +5     
=============================================
+ Hits            9340     9392      +52     
+ Misses          1460     1433      -27     
- Partials         584      586       +2     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

ThomasWaldmann and others added 3 commits June 25, 2026 23:10
Co-authored-by: Junie <junie@jetbrains.com>
Co-authored-by: Junie <junie@jetbrains.com>
The changed-file retry path already rolls back the chunks it added before
re-reading. Do the same when a read error (BackupOSError) interrupts reading
a file: the chunks added since the last checkpoint (item.chunks[from_chunk:])
are not referenced by any committed part item, so without rolling them back
they leak as an inflated chunk refcount (or, if content also shifted, an
orphan chunk).

The rollback lives in process_file_chunks, where from_chunk is known: chunks
before from_chunk are referenced by part files we already wrote and must be
kept; only the uncommitted tail is decref'd.

Adds test_create_erroneous_file_read_retry_rolls_back_chunks: the re-read
chunks dedup to the same ids, so the leak shows up as a too-high refcount
(not an orphan id) - the test checks refcounts directly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ThomasWaldmann ThomasWaldmann force-pushed the backport-7351-retries-1.4 branch from f5aba18 to 7133066 Compare June 25, 2026 22:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant